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Most proteins are regulated by posttranslational modifications and changes in these 
modifications contribute to evolutionary changes as well as to human diseases. Phos- 
phorylation of serines, threonines, and tyrosines are the most common modifications 
identified to date in eukaryotic proteomes. While the mode of action and the function 
of most phosphorylation sites remain unknown, functional studies have shown that 
phosphorylation affects protein stability, localization and ability to interact. Two broad modes 
of action have been described for protein phosphorylation. The first mode corresponds to 
the canonical and qualitative view whereby single phosphorylation sites act as molecular 
switches that either turn on or off specific protein functions through direct or allosteric 
effects. The second mode is more akin to a rheostat than a switch. In this case, a group of 
phosphorylation sites in a given protein region contributes collectively to the modification 
of the protein, irrespective of the precise position of individual sites, through an aggregate 
property. Here we discuss these two types of regulation and examine how they affect the 
rate and patterns of protein phosphorylation evolution. We describe how the evolution of 
clusters of phosphorylation sites can be studied under the framework of complex traits 
evolution and stabilizing selection. 

Keywords: protein phosphorylation, evolutionary turnover, molecular switches, molecular rheostats, protein 
evolution, molecular evolution, cell signaling 



INTRODUCTION 

The rate of discovery of new protein forms is increasing with the 
growing sensitivity of biochemical, analytical and bioinformatics 
tools (Smith et al., 2013). We now contemplate the idea that a large 
fraction of biological diversity originates in mechanisms that reg- 
ulate protein expression and functions posttranscriptionally and 
posttranslationally. Among the major sources of posttranslational 
regulation and cellular complexity are posttranslational modifi- 
cations (PTMs; Jensen, 2006), which are additions of peptides, 
chemical groups or other complex molecules to proteins that mod- 
ify their activity, stability, degradation, localization, and ability to 
interact ( Sprang etal, 1988; Madeo etal, 1998; Vazquez etal, 
2000; Khmelinskii etal., 2009). Hundreds of such modifications 
have been reported in the literature and some of these appear 
to be playing dominant roles, at least in terms of occurrence. 
Protein phosphorylation on serines, threonines, and tyrosines 
dominates by an order of magnitude the number of experimen- 
tal PTMs recorded in common databases (Khoury etal, 2011; Lu 
etal., 2013). This domination likely derives from biases result- 
ing from the long historical interest for protein phosphorylation 
(Cohen, 1982), from the more advanced state of experimental 
identification methods for these PTMs and also for biological rea- 
sons, for instance because of the large number of protein kinases 
in eukaryotic genomes that can perform the necessary enzymatic 



reactions. The impact of protein phosphorylation on the regu- 
lation and deregulation of protein functions in human diseases 
such as Alzheimer's disease (Grundke-Iqbal etal., 1986; Alonso 
etal, 1996), and the numerous gains and losses of phosphory- 
lation in cancer cells (Reimand and Bader, 2013; Reimand etal., 
2013) suggest that it plays a major role in proteome regulation 
and in complex cellular phenotypes. For these reasons, there has 
been much interest in the recent years for understanding how 
these PTMs evolve (Moses and Landry, 2010). However, the study 
of phosphorylation site evolution has met several difficulties that 
derive from the complex mapping between PTMs and protein 
functions. 

MOLECULAR MECHANISMS OF PROTEIN REGULATION BY 
PHOSPHORYLATION 

Phosphorylation sites play roles in allosteric and orthosteric regu- 
lation of proteins (Nussinov et al., 2012). Allosteric regulation acts 
through long-distances and involves a conformational change of 
the protein, while orthosteric regulation occurs at the active site 
of an enzyme or at the interface between a protein and another 
molecule. The study of protein phosphorylation has been his- 
torically centered on the role of individual phosphorylation sites. 
Indeed, a single phosphorylation site may have dramatic effects 
in regulating protein functions. For instance, the phosphorylation 
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of tyrosine 527 on the protein kinase and oncogene Src inacti- 
vates the protein through the interaction of this modified residue 
with its SH2 domain, which closes the kinase into an inactive con- 
formation (Frame, 2002). Because single phosphorylation sites 
in specific cases play key roles in protein regulation, mutations at 
these sites may have complex organismal phenotypes. For instance, 
mutation of Ser47 on the Drosophila circadian clock protein PER 
modifies its interactions with other circadian proteins and length- 
ens adult locomotor activity from 24 to 31 h (Blau, 2008; Chiu 
etal., 2008). It has become clear that the simplistic view of one 
phosphorylation site - one function cannot be generalized to all 
phosphosites. Proteins are often multi phosphorylated and the 
different sites may affect each other's functions (Cohen, 2000). 

Multisite phosphorylation is the premise of more complex 
types of regulation, for instance concerted regulation and modular 
regulation (reviewed in Salazar and Hofer, 2007). In the first case, 
all phosphorylation sites on a protein regulate one or more protein 
functions in a concerted manner (Figure 1). In the second case, 
groups of phosphorylation sites are organized into modules of 
multiple sites found in a short distance in a particular domain 
or disordered region of a protein and each cluster regulates a 
particular and independent function. Each of the mechanisms 
described above has its own complexity in terms of effects on the 



protein and the dynamics of activation, and may thus affect the 
evolution of these sites (Figure 1). It is important to note that 
these mechanisms are not mutually exclusive. They are typically 
combined with various types of logic to encode patterns of sub- 
strate protein activity. For example, phosphorylation of some sites 
might "prime" (or increase the probability of phosphorylation) of 
other sites within the same region through the binding of other 
proteins that enhance the efficiency of the kinase (Cohen, 2000; 
Koivomagi etal., 2013; McGrath etal, 2013). In other cases, sev- 
eral sequential regulatory steps have been shown to be required 
for protein function (Yuan et al, 2002). 

Multisite phosphorylation is associated with complex dynamic 
responses in signaling cascades. The number of sites being phos- 
phorylated and their chronological order of modification can lead 
to graded or switch-like responses (Figure 1 ) , depending on several 
parameters such as enzyme and substrate concentrations, bind- 
ing parameters and the kinase/phosphatase processivity (Salazar 
and Hofer, 2007). One example of graded response involves the 
enhancement of p53 binding to CREB-binding protein (CBP). The 
p53 transactivation domain mediates the interaction with CBP 
and HDM2. Phosphorylation of Thrl8 in this domain regulates 
the qualitative binding (on/off) of HDM2. However, the bind- 
ing to CBP is regulated in a graded manner, with phosphorylation 
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FIGURE 1 |The relationship between site phosphorylation, 
localization and protein functions determines how much 
conservation is expected among species under purifying or 
stabilizing selection. (A) Toy examples of phosphorylation sites 
(indicated as "P"s) and cluster of sites and how they may affect 
protein functions individually or collectively. Phosphorylation sites 
regulate three putative functions A, B, C. The aggregate function of 
phosphorylation sites affects the fitness function of the protein and 
thus determines how many possible equivalent genotypes may give rise 
to equivalent functions or fitness. Only few possible examples are 
shown to illustrate the complex relationships expected and their impact 
on the evolution of phosphorylation profiles and many more are 
possible. (B) Shows a possible fitness landscape for CDK inhibition of 



Ste5. Ste5 inhibition is proportional to the charge (twice the number of 
phosphorylated residues) in the disordered region surrounding the PM 
domain. Evolutionary changes that create CDK consensus sites (IST1-P) 
will increase the strength of the inhibition, while changes that destroy 
consensus sites will reduce the strength of inhibition. The stabilizing 
selection model suggests that as long as the total strength of inhibition 
is within an acceptable range, the exact number and location of 
phosphorylation sites will drift nearly neutrally. A sequence alignment of 
the disordered regions surrounding the PM domain of Ste5 from S. 
cerevisiae and related yeasts is shown on the right. During evolution 
consensus sites are gained and lost (+ [ST]-P or - [ST]-P) on the 
phylogenetic tree, leading to a large diversity in number and location of 
phosphorylation sites in this region. 
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events contributing additively to the binding energy of p53 to CBP 
(Lee etal., 2010). Accordingly, it is the sum of the effect that pro- 
vides an appropriate function to the sites. Another recent example 
of aggregate effect comes from the circadian rhythm protein FRQ 
in Neurospora, which has more than 100 phosphorylation sites. 
It was recently demonstrated that the phosphorylation of FRQ 
is progressive and leads to a buildup of charge in one region of 
the protein that eventually leads to its degradation (Menet and 
Rosbash, 2011; Querfurth etal, 2011). A similar mechanism was 
found to control membrane association of Ste5 in the yeast mat- 
ing pathway (Serber and Ferrell, 2007). The feature of interest 
here is that each phosphorylation site does not have a precise role 
but rather contributes to an aggregate property. The aggregate 
property was demonstrated experimentally in only few cases but 
clusters of phosphorylation sites are so abundant in proteomes 
that a large fraction of proteins could be regulated this way. Large- 
scale studies have indeed found that phosphorylation sites tend 
to localize in dense clusters of serines and threonines, which often 
tend to be phosphorylated by the same kinases (Moses et al., 2007a; 
Schweiger and Linial, 2010), supporting the hypothesis that a frac- 
tion of phosphorylation sites could regulate protein functions in a 
concerted way rather than acting as individual switches. However, 
these large-scale datasets need to be interpreted with caution. As 
it is the case for single phosphorylation sites (see below), clus- 
ters of phosphorylation sites could also appear as a result of 
non-functional phosphorylation. The tendency of phosphoryla- 
tion sites to cluster could in this case result from the fact that 
regions accessible to protein kinases are not randomly distributed 
and form cluster of accessible sites that are susceptible to these 
phosphorylation events. 

EVOLUTION OF PHOSPHORYLATION SITES 

The observation that a large fraction of proteins are posttrans- 
lationally modified led to the hypothesis that changes in PTM 
may contribute to a large fraction of phenotypic variation within 
species and divergence among species (Moses and Landry, 2010). 
Many studies have thus examined the conservation and divergence 
of PTMs, particularly phosphorylation sites. Given the diversity 
of molecular functions that can be regulated by phosphorylation 
sites either individually, or when they are found in a multi-site 
phosphorylated protein, it is expected that their patterns of evolu- 
tion would also be diverse. For example, in the classical paradigm 
where a single phosphorylation event leads to a conformation 
change that alters enzyme activity (Cohen, 1982), one predicts 
that the phosphorylation site and its position are conserved by 
purifying selection, as long as the regulation of that enzyme 
activity is important. Indeed, some well-characterized examples 
are highly conserved sites over evolution (Rittenhouse et al, 1986; 
Landry etal, 2009; Tan etal, 2009; Beltrao etal, 2012). How- 
ever, surprisingly, other well-characterized phosphorylation sites 
that regulate enzyme activity are not conserved (Hwang and Flet- 
terick, 1986; Landry etal, 2009; Boulais etal., 2010; Nguyen Ba 
and Moses, 2010; Freschi etal., 2011, 2014; Levy etal, 2012) sug- 
gesting that regulation of those enzymes is not critical, or that 
other mechanisms exist that make phosphorylation sites and their 
positions unessential. While the phenotypic consequences of the 
changes in individual sites among species is still mostly unknown 



-with a few spectacular exceptions (Lynch etal, 2011) - the esti- 
mates of the relative rate of phosphorylation site evolution based 
on large samples led to some challenging conclusions. Some stud- 
ies concluded that phosphorylation sites are generally under strong 
evolutionary constraints, i.e., that they evolve much slower than 
non-modified residues (e.g., Gray and Kumar, 2011), while others 
estimated that the average constraint imposed on proteins by their 
phosphorylation is relatively weak (e.g., Landry et al., 2009; studies 
reviewed in Levy et al, 2012). 

Beyond the fact that variation in the different methods and 
datasets may contribute to some of the disagreements in estimat- 
ing phosphorylation site conservation, much of the debate comes 
from the fact that some authors focused their attention on cases 
where purifying selection is strong whereas others focused on cases 
where there is no or little purifying selection. On the first hand, in 
cases where phosphorylation is known to play an important role 
in regulating the protein, phosphorylation sites are often strongly 
conserved as predicted (Landry et al., 2009; Nguyen Ba and Moses, 
2010; Beltrao etal., 2012). On the other hand, the more challeng- 
ing observation is that large numbers of uncharacterized sites are 
poorly conserved among species. There are several possible rea- 
sons why sites would evolve quickly. The first is that databases 
reporting large-scale data on phosphorylation may be populated 
with a significant fraction of false-positive identifications, i.e., sites 
that are not actually phosphorylated in cells. Although this lim- 
itation contributes little to our understanding of the evolution 
of phosphorylation sites, it will be an important challenge to be 
addressed by investigators developing instruments and analytical 
tools. Another possible scenario is that the rate of evolution is 
elevated because a significant fraction of sites are species-specific, 
i.e., they have evolved only recently by directional selection and 
tests of conservation among species reject the hypothesis that they 
are under evolutionary constraint. While this hypothesis is of bio- 
logical interest, there is currently very little data supporting this 
possibility (but see Jensen et al., 2006; Kim and Hahn, 201 1; Lynch 
etal, 2011), at least not on a scale that would affect significantly 
the results of analyses performed on thousands of phosphorylation 
sites. 

One other possible mechanism that would explain why many 
phosphorylation sites evolve as non-modified residues would be 
that they provide no function to the protein, i.e., they result 
from non-functional encounters between kinases and their sub- 
strates (Lienhard, 2008; Landry etal, 2009). Because kinases are 
highly processive enzymes and their recognition motifs are highly 
degenerate (Ubersax and Ferrell, 2007), many collisions between 
kinases and proteins may result in the inconsequential phospho- 
rylation of residues. This cause is obviously extremely difficult to 
demonstrate because it is almost impossible to show that some 
trait or molecular feature has no function, as it is impossible 
to test all possible parameters that could reveal its role experi- 
mentally. However, there are lines of evidence that support this 
model. For instance, bona fide functional phosphorylation sites 
are more conserved than the ones for which no evidence is avail- 
able (Landry et al., 2009; Nguyen Ba and Moses, 20 10; Freschi et al., 
2014). In addition, the relative rate of phosphorylation site con- 
servation decreases with protein abundance and increases with the 
stoichiometry of phosphorylation, another observation consistent 
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with a model by which the prevalence of non-functional phospho- 
rylation increases with protein abundance, due to the increased 
probability of encounters between kinases and other proteins 
(Levyet al.,2012). However, this observation is also consistent with 
a higher rate of false positive phosphorylation sites being identi- 
fied in mass spectrometry studies for high abundance proteins. We 
note that two of these explanations (false positives, non-functional 
encounters) posit no or limited biological significance to poorly 
conserved phosphorylation sites, whereas species specific direc- 
tional selection would indicate great biological significance for 
these sites. A final possible explanation for the elevated rate of evo- 
lution of phosphorylation site is the rapid turnover of sites caused 
by the weak constraint on their localization on the protein. This 
is the case that we consider further here, where rapidly evolving 
phosphorylation sites do have biological functions, but function 
is not strongly dependent on each individual site, as discussed 
above in Section "Molecular Mechanisms of Protein Regulation 
by Phosphorylation." 

STABILIZING SELECTION ACTING ON CLUSTERS OF 
PHOSPHORYLATION SITES 

The turnover of phosphorylation sites found in clusters of sites is 
an appealing hypothesis to explain at least a fraction of rapidly 
evolving phosphorylation sites (Landry etal., 2009). Indeed, a 
large fraction of phosphorylation sites are overrepresented in dis- 
ordered regions of proteins where they occur in clusters of several 
juxtaposed sites (Moses et al., 2007a; Landry et al., 2009; Schweiger 
and Linial, 2010) and could act as groups rather than individual 
sites as described in Section "Molecular Mechanisms of Protein 
Regulation by Phosphorylation" (Figure 1). Sites could be gained 
and lost within a given region with limited effect on the over- 
all function, but contributing to an increased rate of evolution 
if one considers sites individually. Different clusters could also 
play similar roles and diverging species could lose and gain these 
clusters without any significant change in function (Drury and 
Diffley, 2009). This model, which represents a previously unde- 
scribed form of "stabilizing selection" (Burger and Lande, 1994; 
Charlesworth, 2013a,b), suggests that clusters of sites rather than 
the sites themselves are the functional unit. This model leads to 
several predictions that can be tested using existing data: (i) pro- 
tein functions regulated by phosphorylation should be preserved 
despite the fact that phosphorylation sites move positions; (ii) 
phosphorylated residues that occur in clusters should evolve faster 
because any change in position would be identified as an evolu- 
tionary modification; (iii) the number rather than the position 
of phosphorylation sites within clusters should be preserved over 
evolution; (iv) sites that are gained and lost between orthologs 
proteins should be gained and lost within these clusters. While 
there are no clear experimental data supporting most of these pre- 
dictions, there are several observations that are consistent with 
them. 

Observations supporting the first prediction come from pro- 
teins involved in DNA replication that are regulated by cyclin 
dependent kinases (CDKs). These proteins show conserved 
regulation in animals and fungi but diverge in the position and 
number of clusters of CDK phosphorylation sites, suggesting that 
functions may be preserved despite changes in phosphorylation 



patterns (Moses et al., 2007b). Additional evidence comes from the 
large-scale study of CDK-dependent sites in budding yeasts where 
it was suggested that phosphorylation sites present in the model 
species but absent in closely related species could in fact be present 
in these other species but at other positions (Holt et al., 2009). The 
second prediction has been tested by (Nguyen Ba and Moses, 2010) 
who examined whether gains and losses of phosphorylation sites 
were more permissive in proteins with a high number of phospho- 
rylation sites but no significant evidence was found against the null 
hypothesis. Evidence supporting predictions (iii) and (iv) comes 
from the analysis of yeast and mammalian phosphoproteomes. In 
the first case, it was shown that despite the low rate of phospho- 
rylation site conservation (here the phosphorylation status of the 
proteins was compared between species or between paralogs), the 
actual number of phosphorylation sites tended to be maintained 
over evolution between homologous proteins (Freschi et al., 201 1) 
or group of proteins (Beltrao etal, 2009). Similarly, orthologs of 
bona fide targets of protein kinases were found to contain con- 
served clusters of phosphorylation sites, often with little amino 
acid sequence conservation (Lai et al, 2012). Finally, it was shown 
that birth and death of phosphorylation sites in proteins tended 
to be clustered in space (Freschi etal., 2011, 2014), supporting the 
last prediction of the model. 

More work will be needed along these lines to test whether 
the stabilizing selection model can apply to clusters of phos- 
phorylation sites. Under this model, a phosphorylation cluster 
would be considered as a quantitative trait where selection does 
not act on a particular site but rather on the number or den- 
sity of sites. In this model, each site represents a single locus 
that contributes to a "complex trait." Deviation from the trait 
optimum is associated with a fitness cost that would increase or 
decrease following a given function (see, e.g., Charlesworth, 20 13a) 
when phosphorylation sites are added or removed by mutations 
in this region (Figure 1). As mutation-selection-drift equilibrium 
is reached after a long divergence time, there will be a large diver- 
sity of "genotypes" that encode the trait. Even if the optimal trait 
value is the same in all species, if the number of phosphoryla- 
tion sites (loci) that contribute to the trait is large, a minimal 
level of conservation will be observed for the actual phospho- 
rylation sites. For example, CDK-mediated inhibition of Ste5 
membrane localization is mediated by 8 phosphorylation sites in 
S. cerevisiae (Figure 1; Serber and Ferrell, 2007; Strickfaden etal., 
2007). The strength of the inhibition is proportional to the net 
charge of the Ste5 PM domain after phosphorylation by CDKs. 
In evolutionary terms, the net charge of the PM domain is a 
quantitative trait whose value simply doubles the sum of the 
CDK phosphorylation sites in the PM domain. If a mutation 
occurs at a phosphorylation site, and renders it unphosphory- 
latable, the net charge of the PM domain will decrease by 2. On 
the other hand, if a mutation creates a new phosphorylation site 
(by adding a serine or threonine followed by a proline) in this 
region, the value of the trait will increase by 2. Assuming that 
there are several values of the trait that are compatible with func- 
tional inhibition by CDK, the PM domain will be free to drift in 
sequence space through any "genotype" that has enough phos- 
phorylation sites to retain function. At mutation-selection-drift 
equilibrium, a large number of genotypes will coexist (Figure 1), 
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corresponding to different positions and numbers of phosphory- 
lation sites in the PM domain. This stabilizing selection model 
has been applied to explain the rapid evolution of transcription 
factor binding sites in highly conserved developmental enhancers 
in drosophila (Ludwig etal., 2000). In one recent simulation 
study, clusters of transcription factor binding sites were shown 
to evolve spontaneously simply because there are a much larger 
number of genotypes with many binding sites that encode the 
trait than "simple" genotypes with few binding sites (He etal., 
2012). Whether this type of model can explain the origin of phos- 
phorylation site clusters is currently a promising open research 
question. 

CONCLUSION 

Given the current knowledge on the function of clusters of phos- 
phorylation sites, there is a need for the development of new 
evolutionary models or adaptation of existing ones that take into 
account the fuzziness of phosphorylation site position and density. 
Although to our knowledge the stabilizing selection model has not 
been explicitly applied to the evolution of phosphorylation sites, 
the distribution of kinase site consensus matches in the absence 
of selection has been calculated (Lai etal, 2012). This represents 
a neutral null hypothesis for the evolution of phosphorylation 
site clusters. Deviations from this model were exploited to predict 
substrates for several protein kinases because orthologs of bona 
fide substrates contain more matches to the phosphorylation site 
consensus than expected in the absence of selection. Knowledge 
on kinase recognition motifs can thus be exploited to examine the 
neutral evolution of phosphorylation sites and thus to derive null 
hypothesis regarding their evolution. 

The study of protein phosphorylation will also need more 
experimental studies on the function of cluster of phosphorylation 
sites in order to learn the general principles by which their aggre- 
gate properties emerge from their combination. This will allow 
for instance to estimate the relationship between the number of 
sites, their density and positions, and protein function and organ- 
ismal fitness. Most evolutionary studies performed so far, with few 
exceptions (Landry etal, 2009), rely on comparative data among 
species and not on within species polymorphisms. Detailed anal- 
ysis of within species variation coupled with functional studies 
could help estimate the distribution of genotypes under selection- 
mutation-drift equilibrium for given clusters of sites. Above all, 
the adaptation of more complex models to the evolution of PTMs 
will provide a better global picture of the evolutionary forces act- 
ing on phosphoproteomes. At the same time, these models will 
have the potential to contribute to biochemical studies whereby 
evolutionary observations will guide experimental investigators in 
studying the code of phosphorylation site regulation. 
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